Indexing Text Using the Ziv-Lempel Trie

نویسنده

  • Gonzalo Navarro
چکیده

Let a text of u characters over an alphabet of size be compressible to n symbols by the LZ78 or LZW algorithm. We show that it is possible to build a data structure based on the Ziv-Lempel trie that takes 4n log 2 n(1 + o(1)) bits of space and reports the R occurrences of a pattern of length m in worst case time O(m 2 log(mm) + (m + R) log n).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Indexing Text using the Ziv - Lempel TrieGonzalo Navarro ?

Let a text of u characters over an alphabet of size be compressible to n symbols by the LZ78 or LZW algorithm. We show that it is possible to build a data structure based on the Ziv-Lempel trie that takes 4n log 2 n(1 + o(1)) bits of space and reports the R occurrences of a pattern of length m in worst case time O(m 2 log(mm) + (m + R) log n).

متن کامل

Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries

We present the first thorough practical study of the Lempel-Ziv-78 and the Lempel-Ziv-Welch computation based on trie data structures. With a careful selection of trie representations we can beat well-tuned popular trie data structures like Judy, m-Bonsai or Cedar.

متن کامل

Automata on Lempel-ziv Compressed Strings

Using the Lempel-Ziv-78 compression algorithm to compress a string yields a dictionary of substrings, i.e. an edge-labelled tree with an order-compatible enumeration, here called an LZ-trie. Queries about strings translate to queries about LZ-tries and hence can in principle be answered without decompression. We compare notions of automata accepting LZ-tries and consider the relation between ac...

متن کامل

Lempel-Ziv factorization: Simple, fast, practical

For decades the Lempel-Ziv (LZ77) factorization has been a cornerstone of data compression and string processing algorithms, and uses for it are still being uncovered. For example, LZ77 is central to several recent text indexing data structures designed to search highly repetitive collections. However, in many applications computation of the factorization remains a bottleneck in practice. In th...

متن کامل

Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays

We present a very efficient, in terms of space and access speed, data structure for storing huge natural language data sets. The structure is described as LZ (Ziv Lempel) compressed linked list trie and is a step further beyond directed acyclic word graph in automata compression. We are using the structure to store DELAF, a huge French lexicon with syntactical, grammatical and lexical informati...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002